https://dx.doi.org/10.1088/1475-7516/2023/02/014

Summary

  • Gaussian processes (GPs) have two components: a mean function and a kernel (covariance) function.
  • The focus is often put on the kernel function, with the mean function assumed to be zero.
  • An arbitrary choice of mean function can lead to unphysical results.
  • Example: calculation of distance moduli of Type Ia supernovae under two cosmological models.
  • Marginalising over a family of mean functions removes the effect of the choice of mean function.

GP Notation

For data \(\{(\boldsymbol{x}_i, y_i); i = 1, \ldots, N\}\) with \(\boldsymbol{x}_i \in \mathbb{R}^d\) and \(y_i \in \mathbb{R}\),

\[y_i = f(\boldsymbol{x}_i) + \varepsilon_i,\]

where \(\varepsilon_i\) is observational noise.

We posit an underlying function,

\[f(\cdot) \sim \mathcal{GP}\left( \mu(\boldsymbol{x};\boldsymbol\theta_\mu),\; k(\boldsymbol{x}, \boldsymbol{x'}; \boldsymbol{\theta}_k) \right)\]

where \(\mu(\cdot)\) is the mean function and \(k(\cdot)\) is the covariance kernel function, with hyperparameters \(\boldsymbol\theta_\mu\) and \(\boldsymbol\theta_k\), respectively.

Mean Function

If we were to take many realisations of a GP, their pointwise mean across the support would converge to the specified mean function.

For example, \(\mu(\boldsymbol{x}) = \boldsymbol{0}\), \(k(x, x') = \exp\{-\|x - x'\|^2\}\):
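This can be illustrated numerically. Below is a minimal NumPy sketch (the grid, number of draws, and jitter term are illustrative choices, not from the paper): draws from a zero-mean GP prior with this squared-exponential kernel average out to the zero mean function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid of input points and the squared-exponential kernel k(x, x') = exp(-|x - x'|^2)
x = np.linspace(0.0, 5.0, 50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)

# Draw many realisations from the zero-mean GP prior
jitter = 1e-8 * np.eye(len(x))          # numerical stabiliser for the Cholesky factor
L = np.linalg.cholesky(K + jitter)
samples = L @ rng.standard_normal((len(x), 2000))  # shape (50, 2000)

# The pointwise average of the realisations approaches the mean function (here, zero)
empirical_mean = samples.mean(axis=1)
print(np.abs(empirical_mean).max())     # small, shrinking as the number of draws grows
```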

Conditional Distribution

\[ \left[ {\begin{array}{c} \boldsymbol{y}\\ \boldsymbol{f^*}\\ \end{array} } \right] \sim \mathcal{N} \left(\left[ \begin{array}{c} \boldsymbol{\mu}\\ \boldsymbol{\mu^*} \\ \end{array} \right], \left[ \begin{array}{cc} k(\boldsymbol{x}, \boldsymbol{x}) + C& k(\boldsymbol{x}, \boldsymbol{x^*})\\ k(\boldsymbol{x^*}, \boldsymbol{x}) & k(\boldsymbol{x^*}, \boldsymbol{x^*}) \\\end{array}\right] \right) \]

where \(\boldsymbol{x}\) are observed points with values \(\boldsymbol{y}\), \(\boldsymbol{x^*}\) are target points with predicted values \(\boldsymbol{f^*}\), and \(C\) is the covariance of the observational noise \(\varepsilon\).

The reconstruction \(\boldsymbol{f^*}\) depends on the choice of \(\boldsymbol{\mu^*}\).

What we actually want is the reconstruction conditioned on the observations, \(\boldsymbol{y}\):

\[\boldsymbol{f^*} \,|\, \boldsymbol{y} \sim \mathcal{MVN}\left(\boldsymbol{\bar{f}^*}, \mathrm{Cov}(\boldsymbol{f^*})\right)\]

with

\[\boldsymbol{\bar{f}^*} = \boldsymbol{\mu^*} + k(\boldsymbol{x^*}, \boldsymbol{x})\left[k(\boldsymbol{x}, \boldsymbol{x}) + C\right]^{-1}(\boldsymbol{y} - \boldsymbol{\mu}),\]

\[\mathrm{Cov}(\boldsymbol{f^*}) = k(\boldsymbol{x^*}, \boldsymbol{x^*}) - k(\boldsymbol{x^*}, \boldsymbol{x})\left[k(\boldsymbol{x}, \boldsymbol{x}) + C\right]^{-1}k(\boldsymbol{x}, \boldsymbol{x^*}),\]

so the mean function enters the posterior mean both through \(\boldsymbol{\mu^*}\) and through the residuals \(\boldsymbol{y} - \boldsymbol{\mu}\).
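As a concrete illustration of this dependence, here is a minimal NumPy sketch of the conditional computation; the toy data, noise level, and the two candidate mean functions are invented for illustration, not taken from the paper. Far from the data, where the kernel correlation dies away, the reconstruction reverts to whichever mean function was chosen.

```python
import numpy as np

def se_kernel(a, b):
    """Squared-exponential kernel k(x, x') = exp(-|x - x'|^2)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

def gp_posterior(x, y, x_star, mean_fn, noise_var=0.01):
    """Posterior mean and covariance of f* | y for a given mean function."""
    mu, mu_star = mean_fn(x), mean_fn(x_star)
    K = se_kernel(x, x) + noise_var * np.eye(len(x))   # k(x, x) + C
    K_star = se_kernel(x_star, x)                      # k(x*, x)
    alpha = np.linalg.solve(K, y - mu)                 # [k(x, x) + C]^{-1} (y - mu)
    f_bar = mu_star + K_star @ alpha
    cov = se_kernel(x_star, x_star) - K_star @ np.linalg.solve(K, K_star.T)
    return f_bar, cov

# Toy data: the same observations reconstructed under two mean functions
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 3.0])
x_star = np.array([5.0])  # far from the data, so the prior mean dominates

f_zero, _ = gp_posterior(x, y, x_star, lambda t: np.zeros_like(t))
f_lin, _ = gp_posterior(x, y, x_star, lambda t: 1.0 + t)
print(f_zero, f_lin)  # the extrapolation reverts to the chosen mean function
```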

Chain Rule & Marginalisation

Joint probabilities can be expressed as conditional probabilities:

\[P(A, B) = P(A|B) \times P(B)\]

\[P(A,B | C) = P(A|B,C) \times P(B|C)\]

Variables can be “integrated out”:

\[P(X) = \int P(X, Y = y) \; dy\]
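A discrete toy example (invented here) makes both identities concrete, with the integral replaced by a sum:

```python
import numpy as np

# Joint distribution P(X, Y) over X in {0, 1}, Y in {0, 1} as a table
P_XY = np.array([[0.1, 0.3],
                 [0.2, 0.4]])   # rows: X, columns: Y

# Chain rule: P(X, Y) = P(X | Y) * P(Y)
P_Y = P_XY.sum(axis=0)          # marginalise X out
P_X_given_Y = P_XY / P_Y        # conditional, broadcast over columns
assert np.allclose(P_X_given_Y * P_Y, P_XY)

# Marginalisation: P(X) = sum over y of P(X, Y = y), the discrete analogue of the integral
P_X = P_XY.sum(axis=1)
print(P_X)
```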

GP Regression

The marginal likelihood is obtained by integrating the latent function values \(\boldsymbol{f}\) out of the joint distribution:

\[p(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\mu}, \boldsymbol{\theta}_k) = \int p(\boldsymbol{y}|\boldsymbol{f}, \boldsymbol{x}) \times p(\boldsymbol{f}|\boldsymbol{x}, \boldsymbol{\mu}, \boldsymbol{\theta}_k) \;d\boldsymbol{f}\]
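For Gaussian noise this integral is available in closed form: \(\boldsymbol{y} \sim \mathcal{MVN}(\boldsymbol{\mu},\, k(\boldsymbol{x}, \boldsymbol{x}) + C)\). A minimal NumPy sketch of the log marginal likelihood follows (the toy data, kernel, and noise level are illustrative, not from the paper); a mean function matching the trend in the data attains a higher marginal likelihood than a zero mean.

```python
import numpy as np

def log_marginal_likelihood(x, y, mean_fn, noise_var=0.01):
    """log p(y | x, mu, theta_k): the integral over f in closed form for Gaussian noise."""
    K = np.exp(-(x[:, None] - x[None, :]) ** 2) + noise_var * np.eye(len(x))
    r = y - mean_fn(x)                        # residuals about the mean function
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (r @ np.linalg.solve(K, r) + logdet + len(x) * np.log(2 * np.pi))

x = np.linspace(0.0, 3.0, 8)
y = 1.0 + x  # toy data generated exactly by a linear trend

# A mean function matching the data trend is favoured over a zero mean
ll_zero = log_marginal_likelihood(x, y, lambda t: np.zeros_like(t))
ll_lin = log_marginal_likelihood(x, y, lambda t: 1.0 + t)
print(ll_zero, ll_lin)
```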

Data

  • Pantheon SNIa compilation contains 1048 points up to \(z \simeq 2.3\).
  • Mock data based on Pantheon with added Gaussian white noise.
  • Two models for comparison:
    • phenomenologically emergent dark energy (PEDE) model,
    • Chevallier-Polarski-Linder (CPL) parameterisation of the DE equation of state.

Method

Apply GP regression with

  1. Different mean functions
     • Zero
     • Best fit
  2. Different kernel functions
     • Squared Exponential
     • Matérn 3/2
  3. Marginalise over different mean functions
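Step 3 can be sketched as follows, assuming (hypothetically) a linear family \(\mu(x; a, b) = a + bx\) with a uniform grid prior over \((a, b)\); the data, grids, and hyperparameters are invented for illustration. Instead of fixing the mean function at a single best fit, each member of the family is weighted by its marginal likelihood.

```python
import numpy as np

def log_marginal_likelihood(x, y, mu, noise_var=0.01):
    """log p(y | x, mu, theta_k) for a squared-exponential kernel and Gaussian noise."""
    K = np.exp(-(x[:, None] - x[None, :]) ** 2) + noise_var * np.eye(len(x))
    r = y - mu
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (r @ np.linalg.solve(K, r) + logdet + len(x) * np.log(2 * np.pi))

x = np.linspace(0.0, 3.0, 8)
y = 1.0 + x  # toy data generated by the trend (a, b) = (1, 1)

# Hypothetical linear family mu(x; a, b) = a + b x with a uniform grid prior
a_grid = np.linspace(-2.0, 4.0, 25)
b_grid = np.linspace(-2.0, 4.0, 25)
logL = np.array([[log_marginal_likelihood(x, y, a + b * x)
                  for b in b_grid] for a in a_grid])

# Posterior weights over (a, b); reconstructions would be averaged with these weights
w = np.exp(logL - logL.max())
w /= w.sum()
a_post = (w.sum(axis=1) * a_grid).sum()
b_post = (w.sum(axis=0) * b_grid).sum()
print(a_post, b_post)  # concentrates near the generating trend (a, b) = (1, 1)
```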

Results: Zero mean function

Results: Best-fit mean function

Results: Marginalised mean functions

Conclusions of Paper

  • The mean function should not be chosen arbitrarily but carefully selected.
  • Don’t use a zero mean function for non-stationary quantities.
  • Full marginalisation does not seem to be very sensitive to the choice of family of mean functions.

Comments

  • Marginalisation is common practice, but model misspecification is still a risk.
  • There is a computational cost, especially for more complex families of functions.
    • use GPs for the mean function?!
  • Alternative ways to prevent “unphysical” reconstructions, e.g., constraints on priors, transformations, etc.